Abstract. This paper is concerned with the complex behavior arising in satisfiability problems. We present a new statistical-physics-based characterization of the satisfiability problem. Specifically, we design an algorithm that produces graphs starting from a k-SAT instance, in order to analyze them and determine whether a Bose-Einstein condensation occurs. We observe that, analogously to complex networks, the networks of k-SAT instances follow Bose statistics and can undergo Bose-Einstein condensation. In particular, k-SAT instances move from a fit-get-rich network to a winner-takes-all network as the ratio of clauses to variables decreases, and the phase transition of k-SAT approximates the critical temperature for the Bose-Einstein condensation. Finally, we employ the fitness-based classification to enhance SAT solvers (e.g., ChainSAT) and obtain the consistently highest-performing SAT solver for CNF formulas, and therefore a new class of efficient hardware and software verification tools.


1 Introduction
Satisfiability (SAT) is a famous logical reasoning problem defined in terms of Boolean variables and logical constraints (clauses) describing the relations among these variables. Each variable can be either True or False, and it can appear in a clause either plain or negated; a variable or its negation is called a literal, and in k-SAT each constraint is the OR of k literals [1]. In general, propositional formulas are represented in Conjunctive Normal Form (CNF): a CNF formula is a conjunction of m clauses, each of which is a disjunction of k literals. SAT has received a great deal of theoretical and experimental study as the paradigmatic NP-complete decision problem [2,3]; its search version, which asks for a satisfying assignment, is NP-hard when clauses contain more than two literals [4]. The SAT problem is also crucial for solving large-scale computational problems, such as AI planning, scheduling [5], protein structure prediction, haplotype inference, circuit-level prediction of crosstalk noise, computer chip verification, termination analysis in term-rewrite systems, model checking, and hardware and software verification [6,7]. Indeed, most verification tools rely on decision procedures that check the satisfiability of a formula generated by the verification process. As a result, practical SAT solvers have received considerable research attention, and numerous solver algorithms have been proposed and implemented [8-10]. In particular, several SAT solvers rely on linear programming [4] or tabu search [11], and their worst-case behavior has been thoroughly analyzed [12]. When instances are randomly generated, the problem is called random satisfiability. The original motivation for studying random instances of k-SAT was the desire to understand the hardness and complexity of typical instances. For this reason, research on k-SAT has focused on developing algorithms for counting the number of solutions [13-15] and on analyzing their computational complexity [16].
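To make the CNF terminology concrete, the following Python sketch represents a clause as a tuple of literals and a formula as a list of clauses, and checks whether a truth assignment satisfies the formula. The DIMACS-style integer encoding (variable $x_j$ written as `j`, its negation as `-j`) and the helper names `literal_value` and `satisfies` are illustrative assumptions, not part of the original work.

```python
from typing import Dict, List, Tuple

# Assumption: DIMACS-style integer literals, where variable x_j is written j
# and its negation is written -j.
Clause = Tuple[int, ...]


def literal_value(lit: int, assignment: Dict[int, bool]) -> bool:
    """Truth value of a literal under a truth assignment of the variables."""
    value = assignment[abs(lit)]
    return value if lit > 0 else not value


def satisfies(formula: List[Clause], assignment: Dict[int, bool]) -> bool:
    """A CNF formula holds iff every clause (an OR) contains a true literal."""
    return all(
        any(literal_value(lit, assignment) for lit in clause)
        for clause in formula
    )


# A 3-SAT instance with n = 3 variables and m = 2 clauses:
# (x1 OR NOT x2 OR x3) AND (NOT x1 OR x2 OR x3)
F = [(1, -2, 3), (-1, 2, 3)]
print(satisfies(F, {1: True, 2: True, 3: False}))   # True
print(satisfies(F, {1: True, 2: False, 3: False}))  # False
```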

The cooperative dynamics of the interacting clauses can exhibit rich new behavior that is not evident in the properties of the individual clauses and literals (the elementary units) that make up a SAT formula (the many-body system) composed of a very large number of these units. Standard experimental methods for studying NP-complete problems use a random generator of problem instances and an exact algorithm (possibly optimized by means of heuristics) to solve them. By analyzing the results with proper measures (e.g., the number of recursive calls), one can obtain important information about the problem, such as phase transitions, the topological characterization of the search space, and clusters of solutions [17]. During the last twenty years, studies in theoretical computer science have exploited new methodologies, based on statistical physics and experimental computer science, for investigating the nature and properties of NP-complete problems [18-20].

There is a deep connection between NP-complete problems and models studied in statistical physics. This connection makes it possible to relate computational complexity to characteristic phase transitions in the k-SAT problem [2]. In Sherrington's work [21], k-SAT is regarded as an extension of the Sherrington-Kirkpatrick spin glass model [22]. Moreover, its graph structure is an extension of Erdős-Rényi random graphs; in particular, k-SAT models on Erdős-Rényi graphs were shown to have free energy limits [23]. Although computer programs based on local dynamical algorithms are unable to reach the HARD-SAT phase in the neighborhood of the k-SAT phase transition, spin glass techniques [24] make it possible to quantify the HARD-SAT region between the SAT and UNSAT ones. Mézard et al. [3] showed the existence of an intermediate phase in k-SAT problems below the phase transition threshold, and a powerful class of optimization algorithms was designed and successfully tested on the largest existing benchmark of k-SAT. Krzakala et al. [25] discovered and analyzed four phase transitions in the solution space of random k-SAT: as the constraint density increases, clusters of solutions appear in the solution space, and then the solutions condense onto a few large clusters. These results strengthen the link between computational models and properties of physical systems, and open the way to new developments and discoveries in this research field.

The goal of our research is to characterize the condensation phenomenon for k-SAT problems by translating a formula into a graph $G = (V, E)$, and then to employ this characterization to improve the well-known ChainSAT algorithm [26]. Inspired by Bianconi and Barabási's work on Bose-Einstein condensation (BEC) in complex networks [27], we design an algorithm that produces graphs starting from a k-SAT instance and associates each clause with a fitness value. The phase diagram of the graphs produced by the algorithm shows evidence of BEC for low values of the clauses-to-variables ratio. From the very beginning, BEC was associated with superfluidity: as London stated in 1938, "the peculiar phase transition (λ point) that liquid helium undergoes at 2.19 K, most probably has to be regarded as the condensation phenomenon of the Bose-Einstein statistic" [28]. By analogy, superfluidity in a k-SAT formula could be thought of as a consequence of the low constraint density that we find in the SAT phase. Our results give new hints for understanding the complexity and structure of a k-SAT instance near the phase transition. The graph of a given instance allows us to satisfy it by finding a truth assignment only for the fittest clauses. Our approach uses complex networks to operate on the instance, without requiring any a priori investigation of its solutions.

The rest of the paper is organized as follows. First, we give an overview of the Bose-Einstein distribution
and tailor it to the satisfiability problem by translating a SAT formula into a graph. Then, we
investigate two variants of our algorithm. We present experimental evidence supporting the hypothesis 
that the phase transition between solvable and unsatisfiable instances of 3-SAT approximates the locus of 
the Bose-Einstein condensation in the phase diagram of 3-SAT formulas. Finally, we show how to improve 
the ChainSAT solver by using our algorithm to provide a clause ordering. 

2 Bose-Einstein Distribution 

From a quantum point of view, all particles of the same type are identical and indistinguishable. Let us consider an isolated system of $N$ identical and indistinguishable bosons confined to a volume $V$ and sharing a given energy $E$. Bosons do not obey the Pauli exclusion principle, so two or more bosons may have exactly the same quantum numbers. We assume that these bosons can be distributed over a set of energy levels, where each level $E_i$ is characterized by an energy $\varepsilon_i$, i.e., the energy of each particle on that level, and by a degeneracy $g_i$, i.e., the number of different physical states available at that level. Accordingly, the $N$ particles are distributed among the energy levels, and each level $E_i$ contains $n_i$ particles, to be accommodated among its $g_i$ quantum states. For instance, if $n_i = 2$ and $g_i = 3$, the particles a and b can settle on $E_i$ in one of these six ways (writing · for an empty state): ab|·|·, ·|ab|·, ·|·|ab, a|·|b, a|b|·, ·|a|b. (Permutations of particles must not be counted, since a and b are indistinguishable.)

It is straightforward to check that the $n_i$ particles and the $g_i - 1$ partitions separating the $g_i$ states of level $E_i$ can be arranged in $(n_i + g_i - 1)!$ different ways. Since bosons are indistinguishable and the partitions are equivalent, the number of possible assignments of $n_i$ bosons to $E_i$ is:

$$W_i = \frac{(n_i + g_i - 1)!}{n_i!\,(g_i - 1)!} = \binom{n_i + g_i - 1}{n_i}. \tag{1}$$

By iterating over all the energy levels $E_i$, one can observe that a distribution $\{n_i\}$ (i.e., a distribution with $n_i$ particles on level $E_i$, $\forall i$) can be obtained in

$$W = \prod_i W_i = \prod_i \frac{(n_i + g_i - 1)!}{n_i!\,(g_i - 1)!} \tag{2}$$

different ways. In other words, $W_i$ is the number of distinct microstates associated with the $i$-th level of the spectrum, while $W$ is the number of distinct microstates associated with the whole distribution $\{n_i\}$.
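As a sanity check of the counting formula for $W_i$ and of the product $W = \prod_i W_i$, the following sketch enumerates the ways of placing $n$ indistinguishable bosons into $g$ distinguishable states and compares the count with the closed form; for $n_i = 2$ and $g_i = 3$ it reproduces the six arrangements listed above. The function names are hypothetical and used only for illustration.

```python
from itertools import combinations_with_replacement
from math import comb, prod
from typing import List


def count_arrangements(n: int, g: int) -> int:
    """Brute force: each way of placing n indistinguishable bosons into g
    distinguishable states is a multiset of size n over the g states."""
    return sum(1 for _ in combinations_with_replacement(range(g), n))


def w_level(n: int, g: int) -> int:
    """Closed form for one level: (n + g - 1)! / (n! (g - 1)!)."""
    return comb(n + g - 1, n)


def w_total(ns: List[int], gs: List[int]) -> int:
    """Microstates of the whole distribution {n_i}: product over the levels."""
    return prod(w_level(n, g) for n, g in zip(ns, gs))


print(count_arrangements(2, 3), w_level(2, 3))  # 6 6
```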
The particle distribution corresponding to statistical equilibrium is the most probable one, i.e., the one that can be realized in the largest number of ways. Hence, to find it, we maximize $W$ subject to the conservation of the number of particles, $\sum_i n_i = N$, and of the system energy, $\sum_i \varepsilon_i n_i = E$. We adopt the method of Lagrange's undetermined multipliers, but rather than maximizing $W$ directly, we maximize $\log W$, which has the same maximum because the logarithm is monotonically increasing. Using Stirling's approximation for large occupation numbers, this method results in the following condition:

$$\sum_i \left[ \log \frac{n_i + g_i}{n_i} - \alpha - \beta \varepsilon_i \right] \delta n_i = 0, \tag{3}$$

where $\alpha$ and $\beta$ are the Lagrange multipliers associated with the two conservation constraints. Since the variations $\delta n_i$ are completely arbitrary, this condition can be satisfied if and only if all their coefficients vanish identically, namely $\log \frac{n_i + g_i}{n_i} = \alpha + \beta \varepsilon_i$ for every $i$. Solving for $n_i$ yields the Bose-Einstein distribution:

$$n_i = \frac{g_i}{e^{\alpha + \beta \varepsilon_i} - 1}, \tag{4}$$

where $\alpha = -\mu / (k_B T)$ and $\beta = 1 / (k_B T)$ are inversely proportional (through Boltzmann's constant $k_B$) to the absolute temperature $T$ of the system at equilibrium, and $\mu$ is the chemical potential.
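A minimal numerical sketch of the Bose-Einstein distribution $n_i = g_i / (e^{\alpha + \beta \varepsilon_i} - 1)$, with $\alpha = -\mu/(k_B T)$ and $\beta = 1/(k_B T)$. The level energies, the chemical potential, and $k_B T$ below are illustrative values of ours in arbitrary units, and the function name is hypothetical; the example shows the lowest level holding far more particles than the others when $\mu$ is just below the lowest energy.

```python
import math
from typing import List


def bose_einstein_occupation(g: float, eps: float, mu: float, kT: float) -> float:
    """Mean occupation g / (exp(alpha + beta * eps) - 1) of a level with
    degeneracy g and energy eps, where alpha = -mu / kT and beta = 1 / kT."""
    if eps <= mu:
        raise ValueError("the chemical potential must lie below the level energy")
    alpha = -mu / kT
    beta = 1.0 / kT
    return g / math.expm1(alpha + beta * eps)  # expm1(x) = exp(x) - 1, accurate near 0


# Three non-degenerate levels; with mu just below the lowest energy,
# the lowest level is by far the most populated one.
levels: List[float] = [0.0, 1.0, 2.0]
occupations = [bose_einstein_occupation(1.0, e, mu=-0.01, kT=1.0) for e in levels]
print([round(n, 3) for n in occupations])
```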

Given an ideal Bose-Einstein gas in equilibrium below its transition temperature, Bose-Einstein condensation (BEC) is the property that a finite fraction of the particles occupies the lowest energy level. Following Penrose and Onsager [29], we can state a criterion of BEC for an ideal gas in equilibrium: BEC ⟺ $\langle n_0 \rangle / N = O(1)$, and no BEC ⟺ $\langle n_0 \rangle / N = o(1)$, where $\langle n_0 \rangle$ is the average number of particles that occupy the lowest energy level $E_0$. (The first condition is a weaker form of requiring that $\langle n_0 \rangle / N$ tend to a positive constant, and it is easier to apply.) At low temperatures, i.e., as $T \to 0$ K, BEC takes place [30]. This phenomenon consists of a very unusual state of aggregation of particles, called the Bose-Einstein condensate. Its properties differ from those of the solid, liquid, gas, and plasma states, which is why it is known as "the fifth state of matter". In particular, below a critical temperature $T_{BEC}$, a macroscopic fraction of the particles (all of them as $T \to 0$) settles in the same quantum state and occupies the same energy level. Hence, these particles are absolutely identical, inasmuch as no possible measurement can tell them apart. In other words, they lose their individuality, and the single-particle picture no longer applies.
Inspired by Bianconi and Barabási's work [27], we provide an algorithm to investigate the BEC phenomenon in the k-SAT problem. By translating a SAT formula into a graph, we define the condensation of the formula over its fittest clause as the emergence of a star-like topology in the graph. This phenomenon is associated with the condensation of bosons on the lowest energy level (see examples in Supporting Information).

3 The SAT to Graph Algorithm 

An instance of the k-SAT problem consists of:

• a set $X$ of variables, with $|X| = n$;

• a set $C$ of clauses over $X$, with $|C| = m$, such that each clause $C_i \in C$, $\forall i = 1, \dots, m$, has $k$ literals and can be written as $C_i = L_1 \lor L_2 \lor \dots \lor L_k$, where each literal $L_\mu$, $\forall \mu = 1, \dots, l$, belongs to $L = X \cup \bar{X} \cup \{True, False\}$, the set of literals, with $|L| = l$.

The problem is to find a satisfying truth assignment for the following formula: 

$$F = C_1 \land C_2 \land \dots \land C_m. \tag{5}$$

The SAT to Graph Transformation Algorithm (S2G) translates a k-SAT instance into a graph $G = (V, E)$, where $V$ is the set of vertices and $E$ the set of edges. A vertex $v_i$ represents a clause $C_i$ of the formula $F$, i.e., $v_i = v(C_i)$, whereas each edge $e_{jh}$ represents a relation between two clauses, i.e., $e_{jh} = (v(C_j), v(C_h))$, as described below. Let us introduce two functions, one for literals and one for clauses. First, we define the global frequency of literals as:

$$\varphi^G(L_\mu) = \text{occurrences of } L_\mu \text{ in } F, \quad \mu = 1, \dots, l, \tag{6}$$

which counts the occurrences of a literal in a k-SAT formula. Second, we define the global fitness of clauses as:

$$f^G(C_i) = \sum_{\lambda=1}^{k} \varphi^G(L_\lambda), \quad L_\lambda \in C_i, \; i = 1, \dots, m, \tag{7}$$

which is a fitness function that evaluates clauses and grows monotonically with the $\varphi^G$ of their literals. The construction of the graph $G = (V, E)$ is an iterative process in which each clause $C_i$ is assigned to a vertex (node) $v_i$, and the edges $e_{jh}$ are links established according to an affinity function, as described below. Since the construction is dynamical, we also need to define the local frequency of literals and the local fitness of clauses. While the global quantities are computed on the complete formula $F$, the local ones concern only the clauses that have already been added as vertices to $V$, i.e., a subformula $F'$ made of a subset of the clauses of $F$. In particular, we define the local frequency of literals as follows:

$$\varphi^L(L_\mu) = \text{occurrences of } L_\mu \text{ in } F', \quad \mu = 1, \dots, l. \tag{8}$$

Analogously, the local fitness of clauses is defined as:

$$f^L(C_i) = \sum_{\lambda=1}^{k} \varphi^L(L_\lambda), \quad L_\lambda \in C_i, \; i = 1, \dots, m. \tag{9}$$

Clearly, at iteration $i$, a literal $L_\mu$ has $\varphi^L(L_\mu) = 0$ if it occurs only in clauses that have not yet been added to $V(G)$; when the algorithm ends, $\varphi^L(L_\mu) = \varphi^G(L_\mu)$, $\forall \mu = 1, \dots, l$.
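The global and local quantities of Eqs. (6)-(9) differ only in the set of clauses on which occurrences are counted. A hedged Python sketch, assuming DIMACS-style integer literals (`j` for $x_j$, `-j` for its negation) and ignoring the constants True and False of the literal set $L$; the function names are illustrative, not the authors' code.

```python
from collections import Counter
from typing import List, Tuple

# Assumption: literals are DIMACS-style integers (j for x_j, -j for NOT x_j);
# the constants True/False of the literal set are ignored here.
Clause = Tuple[int, ...]


def literal_frequencies(clauses: List[Clause]) -> Counter:
    """Occurrences of each literal. On the whole formula F this is the global
    frequency (Eq. (6)); on the subformula F' already in the graph it is the
    local frequency (Eq. (8)). A Counter returns 0 for unseen literals."""
    return Counter(lit for clause in clauses for lit in clause)


def clause_fitness(clause: Clause, phi: Counter) -> int:
    """Sum of the frequencies of the clause's literals (Eqs. (7) and (9))."""
    return sum(phi[lit] for lit in clause)


F = [(1, -2, 3), (1, 2, 3), (-1, 2, -3)]
phi_global = literal_frequencies(F)        # whole formula F
phi_local = literal_frequencies(F[:1])     # F' = first clause only
print([clause_fitness(c, phi_global) for c in F])  # [5, 6, 4]
print([clause_fitness(c, phi_local) for c in F])   # [3, 2, 0]
```

The last clause has local fitness 0 because none of its literals occurs in $F'$ yet, as stated above.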

Hereinafter, the order of literals within a clause is irrelevant, since the OR operator is commutative. This allows us to define a distance that counts how many literals two clauses do not have in common. Let $C_i$ and $C_j$ be two clauses made up of the literals $L^i_\lambda$ and $L^j_\lambda$, respectively; we define the following distance:

$$d(C_i, C_j) = \left| \{ \lambda \in \{1, \dots, k\} : L^i_\lambda \notin C_j \} \right|, \tag{10}$$

which is a metric distance that can be related to the well-known Hamming distance [31]. In Supporting Information A we prove that $d$ is a metric.
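A sketch of Eq. (10) in Python, comparing clauses as sets of DIMACS-style integer literals (our assumption), together with a brute-force check of the metric axioms on all 3-literal clauses over four variables. The check relies on every clause having exactly $k$ distinct literals; for clauses of different sizes, $d$ as written here would not be symmetric. Function names are hypothetical.

```python
from itertools import combinations, product
from typing import List, Tuple

Clause = Tuple[int, ...]


def clause_distance(ci: Clause, cj: Clause) -> int:
    """Eq. (10): number of literals of C_i that do not occur in C_j.
    Clauses are compared as sets, since OR is commutative."""
    return len(set(ci) - set(cj))


def is_metric_on(clauses: List[Clause]) -> bool:
    """Brute-force check of symmetry, identity, and the triangle inequality."""
    for a, b in product(clauses, repeat=2):
        if clause_distance(a, b) != clause_distance(b, a):
            return False
        if (clause_distance(a, b) == 0) != (set(a) == set(b)):
            return False
    return all(
        clause_distance(a, c) <= clause_distance(a, b) + clause_distance(b, c)
        for a, b, c in product(clauses, repeat=3)
    )


# All 3-literal clauses over the variables x1..x4 (no repeated variable).
clauses = [tuple(s * v for s, v in zip(signs, vs))
           for vs in combinations(range(1, 5), 3)
           for signs in product((1, -1), repeat=3)]
print(clause_distance((1, -2, 3), (3, 1, 2)))  # 1: only NOT x2 is missing
print(is_metric_on(clauses))                   # True
```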

Let $G = (V, E)$ be the graph obtained at the $(i-1)$-th iteration, and let $F' \subset F$ be the temporary k-SAT subformula $F' = C_{t_1} \land C_{t_2} \land \dots \land C_{t_{i-1}}$. In order to add a clause $C_{t_i}$ to $G$ as a node $v(C_{t_i})$, we estimate the probability that it is connected to each node already in the graph; this probability must be computed for every node (clause) added to $G$, since it is the criterion used to build edges between nodes. We define the probability that a new node $v(C_{t_i})$ is connected to the node $v(C_{t_j}) \in V(G)$ as:

$$\Pi_{t_j} = \frac{k_{t_j} \, f^L(C_{t_j})}{\sum_{\nu=1}^{|V|} k_{t_\nu} \, f^L(C_{t_\nu})}, \tag{11}$$

where $k_{t_j} = \mathrm{degree}(v(C_{t_j}))$ is the connectivity of $C_{t_j}$ (i.e., the number of links of node $v(C_{t_j})$), and $f^L(C_{t_j})$ is the fitness of clause $C_{t_j}$. This probability distribution ensures that a new vertex is likely to link to an existing one with a high fitness value and/or high connectivity [27]. This process therefore yields a model in which the attractiveness and evolution of a node are determined both by its fitness and by its number of links.
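Eq. (11) is a fitness-weighted preferential attachment rule in the spirit of Bianconi and Barabási's model. The sketch below computes the probabilities for the nodes already in the graph, given their connectivities $k_{t_j}$ and local fitness values $f^L(C_{t_j})$ as parallel lists; the function name and the error raised when every weight is zero are our own assumptions.

```python
from typing import List


def attachment_probabilities(degrees: List[float], fitness: List[float]) -> List[float]:
    """Eq. (11): Pi_{t_j} = k_{t_j} f(C_{t_j}) / sum_nu k_{t_nu} f(C_{t_nu})."""
    weights = [k * f for k, f in zip(degrees, fitness)]
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one node needs nonzero degree and fitness")
    return [w / total for w in weights]


# A node with zero connectivity gets zero probability, whatever its fitness.
print(attachment_probabilities([1, 1, 0], [3.0, 1.0, 5.0]))  # [0.75, 0.25, 0.0]
```

The zero-degree case in the example is exactly why, as explained in Step III, at least one edge must exist before the general step can run.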

In order to assign to the new node-clause $C_{t_i}$ an appropriate number representing an energy level [27], it is necessary to normalize the local fitness values as $\hat{f}^L(C_{t_i}) = f^L(C_{t_i}) / f^L(C_t)$, where $C_t$ is the fittest clause in the temporary graph already built using $F'$. As a result, as soon as the node $v(C_{t_i})$ enters the system, it has the following energy (see [27]):

$$\varepsilon_{t_i} = -T \log \hat{f}^L(C_{t_i}), \tag{12}$$

where $T = 1/\beta$, and $\beta$ is a parameter used to model the temperature of the system. (In this work, when comparing two or more energy levels, we omit the multiplicative factor $T$.) If two different nodes are assigned the same energy value in our model, it means (from a physical point of view) that they represent two different degenerate states of the same energy level, as shown in Table 1.

G = (V, E)  | k-SAT                           | Statistical physics
node        | clause                          | degenerate state of the node's energy level
edge        | link between two clauses        | one particle for each degenerate state involved
node weight | fitness of a clause             | value of the energy level
edge weight | probability of being established | weight on particles

Table 1. Dictionary translating the graph (left) into the k-SAT problem (centre) and statistical physics language (right).
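The normalization and Eq. (12) can be sketched numerically as follows. This is an illustrative Python sketch under our own assumptions: the local fitness values of the clauses currently in the graph are given as a list, they are strictly positive, and the fittest clause $C_t$ is the one with the maximum value. We use the identity $-\log(f/f_{max}) = \log(f_{max}/f)$ so that the fittest clause gets exactly 0. The function name is hypothetical.

```python
import math
from typing import List


def energy_levels(local_fitness: List[float], beta: float = 1.0) -> List[float]:
    """Eq. (12): eps = -T log(f / f_max), with T = 1 / beta and f_max the local
    fitness of the fittest clause C_t currently in the graph."""
    if any(f <= 0 for f in local_fitness):
        raise ValueError("Eq. (12) needs strictly positive local fitness values")
    f_max = max(local_fitness)
    T = 1.0 / beta
    # -log(f / f_max) == log(f_max / f); this form gives +0.0 for the fittest clause.
    return [T * math.log(f_max / f) for f in local_fitness]


# With T = 1 the energies are 0, log 2, and log 4: the fittest clause sits
# on the lowest level, and every level is non-negative.
print(energy_levels([4.0, 2.0, 1.0]))
```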



The definition of probabilities and the linking criterion are the building blocks of the S2G algorithm, 
which consists of three main steps. 

Step I. Let $V = \emptyset$, $E = \emptyset$, and $F' = \emptyset$. Let $i$ be the index of the current iteration; here, we set $i = 1$. The first clause $C_{t_1}$ to add to $F'$ is chosen randomly among the $m$ clauses of the given formula $F$. After computing the local fitness of the clause, we assign to it the normalized local fitness $\hat{f}^L(C_{t_1})$. Since $C_{t_1}$ is the only clause added to the graph so far, its $\hat{f}^L$ is set to 1. We then compute the energy level $\varepsilon_{t_1}$, which in this case is equal to 0, since $\log 1 = 0$. The variable $t$ stores the index of the fittest clause; at the first iteration, it is necessarily set to $t_1$. The pseudo-code of the first step is presented in Algorithm S1.
Step II. Next, we establish the first link between two clauses, as shown in Algorithm S4. This step and the following ones include two procedures, shown in Algorithm S2 and Algorithm S3. The second clause is chosen as the one closest to $C_{t_1}$ in terms of the distance defined in (10). If two or more clauses are at the same minimum distance from $C_{t_1}$, one of them is chosen at random. Notice that at every iteration $i$ all the local frequencies of literals are updated, and therefore the local fitness and the energy level of the clauses are updated as well. We normalize the fitness in order to obtain non-negative energy levels. Indeed, the logarithm function, when its base is greater than 1 and its argument belongs to the interval ]0; 1], returns a non-positive value; since the absolute temperature $T$ is positive, the energy level $\varepsilon = -T \log \hat{f}^L$ is non-negative, as expected.

Step III (general step). The main loop of the S2G algorithm, shown in Algorithm 1, is performed once the first link has been established. As in the previous step, the purpose is to choose an index $t_i$ such that $C_{t_i}$ is the clause closest to the clause with the highest fitness among those in the network so far (after the $(i-1)$-th step). For each established link, we put a particle on each of the two degenerate states of the two clauses involved. Moreover, the probability of establishing a link becomes the weight of the edge representing that link. The general step differs from the second step because it needs at least one edge in the graph to work properly. This prerequisite guarantees at least two nonzero vertex connectivities, so that the probabilities $\Pi_{t_j}$ can be computed, since the denominator in (11) is then nonzero. This is the reason why in Step II we forced $C_{t_1}$ and $C_{t_2}$ to link together.



Algorithm 1 S2G Algorithm

1: Selecting_the_First_Clause-Node()
2: Connecting_First_two_Clauses-Nodes()
3: while $i < m$ do
4:     $i \leftarrow i + 1$
5:     Find_Closest_Clause()
6:     for $j \leftarrow 1$ to $i - 1$ do
7:         $\Pi_{t_j} \leftarrow k_{t_j} f^L(C_{t_j}) / \sum_{\nu=1}^{i-1} k_{t_\nu} f^L(C_{t_\nu})$
8:         try to connect $v(C_{t_i})$ to $v(C_{t_j})$ with probability $\Pi_{t_j}$
9:         $k_{t_j} \leftarrow \mathrm{degree}(v(C_{t_j}))$ /* update connectivity of node $v(C_{t_j})$ */
10:    end for
11:    $k_{t_i} \leftarrow \mathrm{degree}(v(C_{t_i}))$ /* update connectivity of node $v(C_{t_i})$ */
12:    Update_Fitness()
13: end while
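The following Python sketch strings Steps I-III and Algorithm 1 together on a small scale. It is a hedged illustration under several assumptions of ours: literals are DIMACS-style integers, clauses are non-empty, ties for the fittest clause are broken by insertion order, edges are stored as (new, old) pairs, and the energies of Eq. (12) are evaluated once at the end from the final local fitness (which then coincides with the global one), whereas S2G updates them at every iteration. The function name `s2g` and its return values are hypothetical.

```python
import math
import random
from collections import Counter
from typing import Dict, List, Set, Tuple

Clause = Tuple[int, ...]


def s2g(formula: List[Clause], beta: float = 1.0, seed: int = 0):
    """Sketch of S2G; returns (insertion order, edges, degrees, energies)."""
    rng = random.Random(seed)
    remaining = list(range(len(formula)))
    order: List[int] = []
    phi: Counter = Counter()               # local literal frequencies on F'
    degree: Dict[int, int] = {}
    edges: Set[Tuple[int, int]] = set()

    def distance(a: int, b: int) -> int:   # Eq. (10)
        return len(set(formula[a]) - set(formula[b]))

    def fitness(c: int) -> int:            # local fitness, Eq. (9)
        return sum(phi[lit] for lit in formula[c])

    def add_node(c: int) -> None:          # move clause c from F \ F' to the graph
        remaining.remove(c)
        order.append(c)
        degree[c] = 0

    def closest_to(target: int) -> int:    # Find_Closest_Clause(), random ties
        best = min(distance(c, target) for c in remaining)
        return rng.choice([c for c in remaining if distance(c, target) == best])

    # Step I: a random first clause.
    first = rng.choice(remaining)
    add_node(first)
    phi.update(formula[first])

    # Step II: the clause closest to the first one is always linked to it.
    if remaining:
        second = closest_to(first)
        add_node(second)
        edges.add((second, first))
        degree[first] = degree[second] = 1
        phi.update(formula[second])

    # Step III: attach each new clause with the probabilities of Eq. (11).
    while remaining:
        fittest = max(order, key=fitness)
        new = closest_to(fittest)
        existing = list(order)
        add_node(new)
        for old in existing:
            # Denominator recomputed with the current degrees, as in Algorithm 1.
            total = sum(degree[c] * fitness(c) for c in existing)
            if rng.random() < degree[old] * fitness(old) / total:
                edges.add((new, old))
                degree[old] += 1
                degree[new] += 1
        phi.update(formula[new])           # Update_Fitness()

    # Eq. (12) written as log(f_max / f) / beta, evaluated once at the end.
    f_max = max(fitness(c) for c in order)
    energy = {c: math.log(f_max / fitness(c)) / beta for c in order}
    return order, edges, degree, energy


F = [(1, -2, 3), (1, 2, 3), (-1, 2, -3), (2, 3, 4), (-4, 1, 2), (3, -1, 4)]
order, edges, degree, energy = s2g(F, seed=3)
print(order, sorted(edges), degree)
```

Because the attachment is random, different seeds can yield different networks for the same formula, which matches the probabilistic nature of S2G discussed next.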



The S2G algorithm is based on a probabilistic approach, so it may lead to unexpected networks. In this process, the graph is built with dynamical energy levels, i.e., the numerical value of each energy level changes at each iteration, due to the dynamical changes of the local frequencies.

The first clause added to the graph, i.e., $C_{t_1}$, could be chosen differently. For instance, $f^G(C_i)$ could be taken into account in the choice, and the globally fittest clause would then be selected as the first clause in $G$. In this way, the first-mover advantage principle is emphasized, since the first clause is also the fittest one, and it is therefore easier for it to acquire the majority of the links of the whole network. It follows that this technique would lead to more BEC networks, at the cost of the unpredictability of the overall process.
When the graph is completed, we consider the connectivity of the richest node (the node with the maximum number of links) in order to decide whether a Bose-Einstein condensation has occurred. If this connectivity is large enough (the thresholds have been determined experimentally; see Working hypothesis 2), we say that a BEC has taken place in the graph, i.e., one node has a large fraction of the edges and the remaining fraction is shared among all the other nodes. If the graph does not show any condensation, we compute the degree distribution in order to understand what kind of network has developed. Moreover, we compute the mean and the standard deviation of the degrees of all nodes except the winner (i.e., the richest), so as to obtain simple statistics for the rest of the degree distribution.

The computational complexity of our algorithm is polynomial in $N = \max\{n, m, k\}$. The cost of the Step II procedure is dominated by the subprocedure that computes the distance $d$ between the clauses eligible to join the graph and the fittest clause already added to it. The main loop of the S2G algorithm costs at most a further factor of $N$, since it consists of the Step II procedure applied (with slight modifications) to each of the remaining clauses of the k-SAT formula.

4 Fitness-Based Preferential Attachment 

In this section we extend the S2G algorithm by including the concept of preferential attachment, thus obtaining a new algorithm called S2G-PA. This model also starts with two nodes connected by an edge and, exactly as in the previous model, a new node is added to the graph at each iteration. The preferential attachment implemented in the new algorithm is based on the same principle as the algorithm used so far: for a single node of the network, the probability of acquiring new edges is positively correlated with its degree. As in the previous section, in the fitness-based model the connectivity is not the only parameter taken into account: the fitness also plays an important role in computing the probability of acquiring new edges.

The main difference between this model and the previous one is the preferential out-degree $p$, a technique applicable to directed graphs. At each iteration $i$, the node that joins the graph is forced to connect to at most $p$ existing nodes and to at least one node. Recall that in the previous model there was no restriction on the number of outgoing links $od(v)$ that a node could have. As a consequence, some nodes, when they joined the existing network, did not link to any other node of the graph. This caused the probability $\Pi$ of linking a new node to them to remain 0, so their degree remained equal to 0 during the whole process, i.e., they never joined the main connected component of the graph. In contrast, the new algorithm ensures that all the nodes are part of the network, i.e., every node has at least one link and $G$ has a single connected component. The output networks of S2G and S2G-PA can be compared in Supporting Information C and Supporting Information D. When the most connected nodes have the highest number of particles and the winner node is identified with the lowest energy level, we obtain a clear "signature" of BEC in a preferential attachment scheme with fitness, as proved by Borgs et al. [32]. These facts support a clear mapping between the Bose gas and the graph produced by the S2G algorithm when BEC occurs.

The preferential attachment ensures that the condition $1 \le od(v(C_{t_i})) \le p$, $\forall i = 1, \dots, m$, holds at each iteration, where $od$ is the node out-degree. In practice, our new algorithm sets the out-degree to $p$, but two or more links may be directed to the same node, depending on the computed probabilities (multiple links are nevertheless counted as simple ones). Hence, the resulting out-degree of the new node may be less than $p$, but it is always $\ge 1$. Conversely, the in-degree has no restrictions. In general, standard preferential attachment leads to random scale-free Barabási-Albert networks [33], whose degree distribution decays as a power law, i.e., appears as a straight line on a log-log scale. In our case, the preferential attachment is combined with a fitness function (hence the name fitness-based preferential attachment), so the resulting network is not exactly scale-free. Furthermore, the new model causes competition among nodes [34]. Indeed, a new node has a fixed number $p$ of links available, so the existing nodes have to compete to acquire a link from the new node. This competition becomes more and more challenging as the graph grows, since the number of nodes increases while the number of links available from a new node remains the same. The resulting network clearly obeys the widely known first-mover advantage principle [22], according to which the first nodes of the graph have more time to gain links than the last ones. Finally, our fitness-based model makes the system highly unpredictable, since the fitness of each node changes at each iteration, as explained before. Thanks to this mechanism, a node with high fitness may enter the graph at a later time and become richer and richer until it overtakes the richest nodes. On the other hand, once a node has entered the graph, its fitness may remain the same in later iterations, so other nodes may overtake it. These features lead to a dynamic and erratic evolution of the network.

5 Non-Integer Out-Degree 

Let us consider the $i$-th iteration of the graph generation, when the node $v(C_{t_i})$ is added to the network. Suppose that, according to the probabilities $\Pi$, the new node $v(C_{t_i})$ must be linked to the existing node $v(C_t)$. In this section, we make the following hypothesis.

Working hypothesis 1. An outgoing link is less important than an incoming link, that is, incoming links are rewarded more than outgoing links.

This hypothesis implies that our graph must be regarded as a directed graph, both to maintain the correspondence between a k-SAT instance and its graph and to distinguish between outgoing and incoming links. Following the Google-like reference [31], the same edge between the new node $v(C_{t_i})$ and the existing node $v(C_t)$ does not increase their connectivities $k_{t_i}$ and $k_t$ in the same way (see Figure 1).




[Figure 1: link from $C_{t_i}$ to $C_t$, with the update $k_{t_i} \leftarrow k_{t_i} + \theta$, where $0 < \theta < 1$.]

Fig. 1. Link between $C_{t_i}$ and $C_t$. The dashed line represents the non-integer out-degree $\theta$ of $C_{t_i}$, while the continuous line represents the integer in-degree of $C_t$.



Nevertheless, we continue to represent our graph as an undirected graph, making use of the relation $k_i = \theta \cdot od(v(C_i)) + id(v(C_i))$, where $od$ and $id$ are the node out-degree and in-degree, respectively. A non-integer connectivity (i.e., a non-integer degree) clearly leads to a new kind of network evolution. In this new model, nodes aim to connect to a particular node in the network, and when they manage to do so, that node gets richer more rapidly than in the previous models. In fact, since incoming links are rewarded more than outgoing links, the connectivity of the node that acquires links rises much faster than the connectivity of the nodes linking to it. We set $\theta = 0.33$, so that an outgoing link is rewarded about a third of an incoming link. The plot in Figure 2 was obtained by fixing the number of variables at $n = 100$ and letting the number of clauses $m$ vary from 0 to 1000, so that $\alpha = m/n$ (the number of clauses over the number of variables) varies from 0 to 10. The plot depicts the relationship between $\alpha$ and the percentage of each of the three classes of networks returned by our algorithm, according to the following hypothesis.

Working hypothesis 2. Let us call fraction-winner $f$ the fraction of links acquired by the winner node over the whole set of links. We say that:

• a Fit-get-rich topology takes place when $f < 0.75$;

• a Partial BEC takes place when $0.75 \le f < 0.90$;

• a Full BEC takes place when $f \ge 0.90$.
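Working hypothesis 2 can be applied mechanically to a finished graph. In the sketch below, which is our own illustration, the fraction-winner is computed from an edge list by counting every link incident to the richest node (for the directed S2G-PA graphs one might instead count only incoming links); the class boundaries follow the caption of Figure 2, with 0.75 and 0.90 included in the higher class. Function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple


def fraction_winner(edges: Iterable[Tuple[int, int]]) -> float:
    """Fraction of all links that are incident to the richest (winner) node."""
    edges = list(edges)
    incident: Counter = Counter()
    for u, v in edges:
        incident[u] += 1
        incident[v] += 1
    return max(incident.values()) / len(edges)


def classify(f: float) -> str:
    """Working hypothesis 2: thresholds 0.75 and 0.90 on the fraction-winner."""
    if f >= 0.90:
        return "Full BEC"
    if f >= 0.75:
        return "Partial BEC"
    return "Fit-get-rich"


# A star with 9 leaves plus one extra leaf-to-leaf link: f = 9/10.
star_plus_one = [(0, i) for i in range(1, 10)] + [(1, 2)]
print(classify(fraction_winner(star_plus_one)))  # Full BEC
```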



[Figure 2: percentage of Fit-get-rich, Partial BEC, and Full BEC networks versus $\alpha$, with the corresponding fitted curves (Fit Fit-get-rich, Fit Partial BEC, Fit Full BEC).]

Fig. 2. Percentage of each kind of network against the ratio of clauses to variables, with a fixed number of variables $n = 100$ and non-integer out-degree $\theta = 0.33$. The lines represent sixth-order polynomial regressions fitted to the data. Full BEC occurs when the winner node acquires at least 90% of the links of the network. Partial BEC occurs when the winner node acquires at least 75% and less than 90% of the links of the network. If the percentage is less than 75%, the network has a Fit-get-rich topology.



As shown in Figure 2, we obtain a large number of networks that undergo full BEC (in which one node is an evident "winner"). Figure 2 also shows that the number of BEC networks produced by our algorithm increases as $\alpha$ decreases.



In Algorithm 2 we show the whole fitness-based preferential attachment algorithm with non-integer out-degree (S2G-PA). The preferential attachment technique replaces the $\Pi_t$ attachment scheme used in S2G. The S2G-PA algorithm uses the cumulative distribution function to ensure that, for each node, the probability of acquiring new links is directly proportional to its $\Pi_t$. Compared to the S2G model, S2G-PA ensures that all the nodes have a nonzero connectivity, and also that an existing node-clause $v(C_j)$ with high fitness and connectivity (i.e., with high $\Pi_t$) has a higher probability of acquiring links from new nodes, since the non-integer out-degree method emphasizes this behavior. Instructions 16 and 17 show how the non-integer out-degree has been implemented in our S2G-PA algorithm.
Algorithm 2 S2G-PA Algorithm

1: Selecting_the_First_Clause-Node()
2: Connecting_First_two_Clauses-Nodes()
3: while i < m do
4:   i ← i + 1
5:   Find_Closest_Clause()
6:   for j ← 1 to i − 1 do
7:     Π_{t_j} ← … /* probability of linking v(C_{t_i}) to v(C_{t_j}) */
8:     Π^cum_{t_0} ← 0
9:     for j = 1 to i − 1 do
10:      Π^cum_{t_j} ← Π^cum_{t_{j−1}} + Π_{t_j} /* compute cumulative probabilities */
11:    end for
12:    for z ← 1 to p do
13:      x ← random([0, 1])
14:      find j ∈ {1, ..., i − 1} such that Π^cum_{t_{j−1}} < x ≤ Π^cum_{t_j}
15:      connect v(C_{t_i}) to v(C_{t_j})
16:      k_{t_j} ← k_{t_j} + 1 /* update connectivity of node v(C_{t_j}) */
17:      k_{t_i} ← k_{t_i} + δ /* update connectivity of node v(C_{t_i}) */
18:    end for
19:  end for
20:  Update_Fitness()
21: end while
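The cumulative-probability sampling of Algorithm 2 can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: since the expression for Π_{t_j} is not legible in the pseudocode, the sketch assumes the Bianconi-Barabási form Π_j ∝ f_j k_j (fitness times connectivity), and the function name and list-based data layout are hypothetical. As in Instructions 16 and 17, each chosen target gains a full link while the new node gains only δ.

```python
import bisect
import random


def attach_new_node(fitness, connectivity, p=1, delta=0.33, rng=random):
    """Link a new clause-node to p existing nodes by preferential attachment.

    fitness[j], connectivity[j]: fitness and connectivity k_j of existing node j.
    ASSUMPTION: Pi_j is proportional to fitness[j] * connectivity[j].
    Updates `connectivity` in place; returns (targets, connectivity of new node).
    """
    weights = [f * k for f, k in zip(fitness, connectivity)]
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one node needs positive fitness * connectivity")
    # cumulative probabilities Pi_cum,1 <= ... <= Pi_cum,i-1 (last one is ~1.0)
    cumulative = []
    running = 0.0
    for w in weights:
        running += w / total
        cumulative.append(running)
    new_k = 0.0
    targets = []
    for _ in range(p):
        x = rng.random()                          # x uniform in [0, 1)
        j = bisect.bisect_right(cumulative, x)    # first j with x < Pi_cum,j
        j = min(j, len(cumulative) - 1)           # guard against round-off
        targets.append(j)
        connectivity[j] += 1                      # target gains a full link
        new_k += delta                            # new node gains only delta
    return targets, new_k


rng = random.Random(42)
fitness = [1.0, 0.5, 0.2]
connectivity = [3.0, 1.0, 1.0]
targets, k_new = attach_new_node(fitness, connectivity, p=1, rng=rng)
print(targets, k_new, connectivity)
```

Binary search over the cumulative list replaces the linear "find j" of the pseudocode; both select the same interval up to boundary points of probability zero.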



6 S2G-driven SAT Solvers 

Using the information provided by the S2G algorithm, in this section we show the improvement obtained in the performance of the ChainSAT algorithm [26]. The S2G algorithm assigns an energy value to each clause of a random k-SAT instance. As seen before, the fitness value of a clause is negatively correlated with its energy, and positively correlated both with the probability of having a high connectivity in the network and with the probability that its literals occur frequently in the instance. Thus, the probability of satisfying all the linked clauses by assigning truth values to only one of them is larger if we assign truth values to the one with the lowest energy value. Consequently, in order to solve an instance, we order the clauses by energy level. If two or more clauses have the same energy, we put first the one with the largest connectivity in the graph provided by the S2G algorithm. If they also have the same connectivity, we order them randomly. As a result, an order is established among the clauses of a random k-SAT instance. In the following, we refer to the order of the clauses as their "weight". In particular, the heaviest clause is the one on the lowest energy level.
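The ordering rule just described (energy first, then connectivity, then a random tie-break) amounts to a sort with a composite key. A minimal Python sketch, assuming each clause is summarized by a hypothetical `ClauseInfo` record holding its S2G energy and its connectivity in the S2G graph:

```python
import random
from dataclasses import dataclass


@dataclass
class ClauseInfo:
    index: int           # position of the clause in the k-SAT formula
    energy: float        # energy level assigned by S2G
    connectivity: float  # degree of the clause-node in the S2G graph


def order_by_weight(clauses, rng=random):
    """Return clauses from heaviest (lowest energy) to lightest.

    Energy ties are broken by larger connectivity; remaining ties randomly.
    """
    shuffled = list(clauses)
    rng.shuffle(shuffled)  # decides the order among complete ties only
    # sorted() is stable, so the shuffle survives only where both keys tie
    return sorted(shuffled, key=lambda c: (c.energy, -c.connectivity))


clauses = [ClauseInfo(0, 0.7, 2), ClauseInfo(1, 0.0, 5),
           ClauseInfo(2, 0.7, 4), ClauseInfo(3, 0.2, 1)]
print([c.index for c in order_by_weight(clauses, random.Random(0))])  # [1, 3, 2, 0]
```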

ChainSAT [26] is a heuristic that never moves up in energy, since the number of unsatisfied clauses is a non-increasing function of the sequence of trial configurations traversed by the algorithm. For k = 4, k = 5, and k = 6, ChainSAT has been shown to solve random k-SAT problems almost surely in time linear in the number of variables. The ChainSAT algorithm, given in pseudo-code in Algorithm S5, (i) never increases the energy of the current configuration S, and (ii) exercises circumspection in decreasing the energy. The ChainSAT algorithm has two adjustable parameters: p1 for controlling the rate of descent (by accepting energy-lowering flips), and p2 for limiting the length of the chains, in order to avoid looping. In our experiments, we set p1 = p2 [26].

We present two new versions of the ChainSAT algorithm. In these new versions we replace the random choice of clauses (see lines 5 and 18 of Algorithm S5) with an ordered one. Since ChainSAT is based on a non-increasing energy principle [55], and given the energy levels provided by the S2G algorithm, our idea is to select clauses with minimum energy wherever ChainSAT would perform a random selection.

Let us introduce a set A = {a_1, a_2, ..., a_m}, where m is the number of clauses of the k-SAT formula. The set A is used to record which clauses of the k-SAT instance have already been chosen, so that loops (consisting of choosing the same clause repeatedly) are avoided. Let H = {C_1, C_2, ..., C_r} be the set of clauses among which the algorithm chooses (line 5 or 18 of Algorithm S5). We suppose that these clauses are arranged in decreasing order of weight. In particular, C_1 is one of the heaviest clauses in H, i.e., C_1 is one of the clauses of H with the lowest energy. The steps for selecting a clause in line 5 or 18 of Algorithm S5 are the following. Initially we set a_i = 0, ∀ i = 1, ..., m. At each step we require that the algorithm choose the heaviest clause C_i among those in H such that a_i = 0. Every time a clause C_i is chosen, we set a_i = 1. Step by step, the number of elements in A equal to 1 increases. When a_i = 1 ∀ C_i ∈ H, the clause is chosen randomly. This random choice is necessary to prevent our algorithm from always analyzing the same clause-variable-clause chains.

Our modified version of ChainSAT presented so far selects the new clause using the same set of elements a_i, i = 1, ..., m, both for the satisfied and for the unsatisfied clauses. We call this version LC-ChainSAT, where LC stands for "Linked Clauses", since the choice of a clause in lines 5 and 18 of Algorithm S5 is based on the same set A.

We also present a second new version of the ChainSAT algorithm, called NLC-ChainSAT, where NLC stands for "Non Linked Clauses". This version differs from the first one in that we replace A with two sets A_sat and A_unsat, with the same structure as A. We use A_unsat to store the clauses chosen (as not satisfied) by line 5 of Algorithm S5, and A_sat to store the ones chosen (as satisfied) by line 18. The new algorithm runs exactly like the previous one, but when it must select a new clause, it examines the set A_unsat or A_sat depending on whether the new clause is chosen by line 5 or by line 18, respectively.
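The two variants differ only in whether lines 5 and 18 of Algorithm S5 share one memory. The Python sketch below is an illustrative reading of the selection rule, not the authors' code; the class name, the `weight_rank` mapping (0 = heaviest clause), and the role labels "unsat" and "sat" are hypothetical.

```python
import random


class ClauseSelector:
    """Ordered clause choice for lines 5 and 18 of Algorithm S5 (sketch).

    weight_rank[c]: position of clause c in the S2G order (0 = heaviest).
    linked=True  -> LC-ChainSAT: one set A shared by lines 5 and 18.
    linked=False -> NLC-ChainSAT: separate sets A_unsat (line 5) and A_sat (line 18).
    Each set stores the clauses whose flag a_i has been set to 1.
    """

    def __init__(self, weight_rank, linked=True, rng=None):
        self.rank = weight_rank
        self.rng = rng or random.Random()
        if linked:
            shared = set()
            self.memory = {"unsat": shared, "sat": shared}
        else:
            self.memory = {"unsat": set(), "sat": set()}

    def choose(self, candidates, role):
        """Choose a clause from H = candidates; role is "unsat" or "sat"."""
        chosen = self.memory[role]
        fresh = [c for c in candidates if c not in chosen]
        if fresh:
            pick = min(fresh, key=lambda c: self.rank[c])  # heaviest with a_i = 0
            chosen.add(pick)                                # set a_i = 1
            return pick
        return self.rng.choice(list(candidates))            # all a_i = 1: random


rank = {0: 2, 1: 0, 2: 1}
lc = ClauseSelector(rank, linked=True)
print(lc.choose([0, 1, 2], "unsat"), lc.choose([0, 1, 2], "sat"))    # 1 2
nlc = ClauseSelector(rank, linked=False)
print(nlc.choose([0, 1, 2], "unsat"), nlc.choose([0, 1, 2], "sat"))  # 1 1
```

With a shared memory, a clause picked at line 5 is skipped at line 18; with separate memories, the same heavy clause can be picked once by each line.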

7 Experimental Results 

In this section we investigate the outcomes of our algorithms. First, we give numerical evidence of the presence of Bose-Einstein condensation in the k-SAT problem, focusing on the phase transition region. We evaluate the phase diagram of the S2G algorithm to show the transition between a fit-get-rich phase and a winner-takes-all phase. Second, we analyze the SAT solvers proposed above by evaluating their performance on both random and real-life SAT instances.

7.1 S2G Results 

For the 3-SAT problem there is strong evidence [2] that the phase transition between solvable and unsatisfiable instances is located at α = 4.256, where α = m/n is the ratio between the number of clauses m and the number of variables n. For our experiments we use A. van Gelder's k-SAT instance generator mkcnf¹. We asked the program to generate both satisfiable and unsatisfiable formulas uniformly, to obtain a purely uniform random k-SAT distribution. For our experimental protocol, we consider α ∈ (0, 10] and n ∈ {10, 25, 50, 75, 100}. For each value of α, we consider 100 formulas and perform 30 independent graph constructions per formula. We use the S2G-PA algorithm with δ = 0.33 and p = 1.
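For readers without access to mkcnf, a comparable uniform random k-SAT formula can be drawn as below. This Python sketch is a stand-in, not van Gelder's program: it takes the same four inputs as mkcnf (seed, k, n, m), picks k distinct variables per clause, negates each with probability 1/2, and is not expected to reproduce mkcnf's output for a given seed. The helper names are hypothetical.

```python
import random


def random_ksat(seed, k, n, m):
    """Uniform random k-SAT formula with m clauses over variables 1..n.

    Each clause has k distinct variables, each negated with probability 1/2.
    Literals are DIMACS-style signed integers.
    """
    if k > n:
        raise ValueError("need k <= n distinct variables per clause")
    rng = random.Random(seed)
    formula = []
    for _ in range(m):
        variables = rng.sample(range(1, n + 1), k)
        formula.append([v if rng.random() < 0.5 else -v for v in variables])
    return formula


def to_dimacs(formula, n):
    """Serialize a formula in DIMACS CNF format."""
    lines = [f"p cnf {n} {len(formula)}"]
    lines += [" ".join(map(str, clause)) + " 0" for clause in formula]
    return "\n".join(lines)


# alpha = m / n close to the 3-SAT threshold quoted above
n = 100
m = round(4.256 * n)
print(to_dimacs(random_ksat(seed=1, k=3, n=n, m=m), n).splitlines()[0])  # p cnf 100 426
```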

In Figure 3 we plot the percentage of BEC networks observed when varying α. In general, when α < 3 the resulting graph most likely undergoes a clear Bose-Einstein condensation; in this phase, the fittest clause maintains a large number of links even as the graph expands. Moreover, as α increases and enters the phase transition region, the drop in the number of Bose-Einstein condensations becomes smoother. This different behavior seems to match the increasing complexity of formulas with α ≈ 4.256 (the locus of the phase transition for 3-SAT), and therefore we investigate it more thoroughly later. For α > 5, the graph shows a fit-get-rich behavior, i.e., there is an increasing number of fittest nodes (clauses), but there is no longer a unique winner node. This behavior indicates that, for increasing α values (close to the UNSAT phase of 3-SAT), we have to find a truth assignment for many clauses to obtain a satisfiable formula. This empirical evidence is consistent with the transition between SAT and UNSAT instances [3].

¹ Available at ftp://dimacs.rutgers.edu/pub/challenge/satisfiability/contributed/UCSC/instances
The program mkcnf.C takes four inputs: r, the random seed; k, the number of literals in each clause; n, the number of Boolean variables; m, the number of clauses.

[Fig. 3 plot: percentage of BEC networks for 10, 25, 50, 75, and 100 variables, each with its polynomial fit; marker at α = 4.256]

Fig. 3. Bose-Einstein condensation (BEC) in 3-SAT. We report on the x-axis the ratio α of clauses to variables, and on the y-axis the percentage of BEC networks found. The points have been fitted through a sixth-order polynomial regression. The gray stripe shows the region where the critical temperature T_BEC for Bose-Einstein condensation could be located.

In order to evaluate the way in which α (i.e., the ratio of clauses to variables) influences the evolution of the graph associated with k-SAT instances, we examine the fraction winner f, defined as the ratio of the number of links shared by the winner (i.e., the highest-degree node) to the number of links of the whole graph. Figure 4 shows how the fraction winner varies as a function of the ratio of clauses to variables. We let the number of variables vary in the set {10, 25, 50, 75, 100}. Each point of the plot has been computed by averaging over 1000 different 3-SAT instances, with 30 graph generations per instance. The plot shows that the fraction winner f decreases with α. When a 3-SAT instance is satisfiable (with high probability), the S2G-PA algorithm produces a graph condensed over the fittest clause. Conversely, when a 3-SAT instance is unsatisfiable (with high probability), its graph exhibits a winner node incapable of maintaining the whole connectivity of the network, and some hubs appear and grow following the fit-get-rich model. Reading the plots in Figure 4 from right to left, one can observe that when α becomes smaller than the critical value 4.256, the winner node holds the vast majority of links. In this case, the Bose-Einstein condensation takes place regardless of the number of variables. Moreover, the plot for 50 variables clearly shows a smooth drop for 4 < α < 5, indicating that the 50-variable graphs undergo the slowest Bose-Einstein condensation (as α decreases). Note that for α < 1.1 the fraction winner is equal to 1, since the winner node holds all the links in the network, i.e., all the edges have the winner node as a vertex. Our results suggest that this phase, called winner-takes-all (WTA), starts at α = 1 for large values of n.
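The fraction winner f can be computed directly from the edge list of a generated graph. A minimal Python sketch, assuming the graph is given as a list of (u, v) links without self-loops; the function name is hypothetical, and ties for the winner are broken by whichever node `max` meets first, a choice the text does not specify.

```python
from collections import defaultdict


def fraction_winner(edges):
    """Return (f, winner): the share of links incident to the highest-degree node.

    edges: iterable of (u, v) pairs, one per link of the graph.
    """
    edges = list(edges)
    if not edges:
        raise ValueError("graph has no links")
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    winner = max(degree, key=degree.get)
    return degree[winner] / len(edges), winner


edges = [(1, 0), (2, 0), (3, 0), (3, 1)]
print(fraction_winner(edges))  # (0.75, 0)
```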

Below the phase transition region, the slopes of the plots behave differently than in the other regions. Specifically, below the phase transition of 3-SAT, located at α = 4.256, the fraction winner increases at a higher rate. In order to evaluate the slopes, in Figure 5 we plot the second derivative of the polynomial curves of Figure 4 as a function of α. A high value of the second derivative indicates a rapid change of the fraction-winner slope. For 25, 50, and 75 variables, the 3-SAT phase transition found by Mézard et al. [3] approximates the local maximum of the fraction-winner second derivative. This local maximum represents the value of α corresponding to the most rapid change in the fraction-winner slope in the neighborhood of 4.256. Therefore, the phase transition of 3-SAT seems to be the critical temperature for Bose-Einstein condensation. These outcomes confirm the experimental findings mentioned above, and are consistent with those referring to the plots in Figure 3. The behavior of the plot for 100 variables, slightly different from the others, is due to the unexpected values of the fraction winner obtained as α approaches 10, which cause the sixth-order curve to exhibit a high curvature in the neighborhoods of α = 2.6 and α = 8.

Fig. 4. Phase diagram of 3-SAT. We report the fraction of links shared by the winner against α (the ratio of clauses to variables). Each point is an average over 1000 3-SAT instances with 30 graphs per instance. We have performed a sixth-order polynomial regression to fit the data. Satisfiable instances (with high probability) belong to the winner-takes-all phase. Unsatisfiable instances (with high probability) belong to the fit-get-rich phase. The critical temperature T_BEC for Bose-Einstein condensation could be located in the grey area in the neighborhood of the SAT-UNSAT phase transition α = 4.256. Below the critical temperature, the fraction winner increases at a higher rate.
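One way to reproduce the second-derivative analysis behind Fig. 5 is to differentiate the fitted polynomial analytically. The NumPy sketch below assumes the fraction-winner averages are available as arrays; the function name and the search window [3.5, 5] are hypothetical choices. The returned point is the largest value of f'' in the window, which coincides with a local maximum only when that maximum lies strictly inside it.

```python
import numpy as np


def max_curvature_point(alpha, f_winner, lo=3.5, hi=5.0, degree=6):
    """Fit f(alpha) with a polynomial and return (alpha*, f''(alpha*)),
    where alpha* in [lo, hi] maximizes the fitted second derivative."""
    coeffs = np.polyfit(alpha, f_winner, degree)  # sixth-order fit by default
    second = np.polyder(coeffs, 2)                # coefficients of f''
    grid = np.linspace(lo, hi, 2001)
    values = np.polyval(second, grid)
    best = int(np.argmax(values))
    return float(grid[best]), float(values[best])


# synthetic illustration only (not data from Figure 4)
alpha = np.linspace(0.5, 10.0, 40)
f_synthetic = 1.0 / (1.0 + np.exp(2.0 * (alpha - 4.256)))
print(max_curvature_point(alpha, f_synthetic))
```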

In Figure 6 we plot the mean connectivity of the network nodes, computed on the degree distribution without considering the winner node. As previously discussed, as α increases the winner node loses connectivity, and therefore the other nodes acquire links. In the inset, we plot the standard deviation of the 50-variable degree distribution (the plots for 10 and 100 variables are shown in Figure S4 in Supporting Information E). A high standard deviation indicates that the connectivity values are scattered, hence the network exhibits highly connected hubs. More precisely, instances with high constraint density α not only have a less rich winner node than low constraint density instances, but also show a higher spread in the non-winner node connectivity. In other words, the number of hubs is positively correlated with α, and this result is consistent with the plot in Figure 3, which shows that the number of condensed networks decreases as α increases. Remarkably, the rate at which the standard deviation increases is higher to the left of the Bose-Einstein condensation region. The growing hubs of typical S2G-PA output networks are shown in Figure S5 in Supporting Information F.

Fig. 5. 3-SAT Bose-Einstein condensation locus. We plot the second derivative of the fraction winner as a function of the ratio α of clauses to variables. The SAT-UNSAT phase transition α = 4.256 is near the local maximum of the second derivative, and therefore corresponds to a quick change of the fraction-winner slope. In other words, the 3-SAT phase transition acts as the BEC critical temperature.
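The statistics plotted in Fig. 6 exclude the winner before averaging. A minimal Python sketch, assuming the connectivities are held in a dictionary and that the population standard deviation is meant (the text does not say which estimator is used); the function name is hypothetical.

```python
import statistics


def non_winner_stats(connectivity):
    """Mean and population standard deviation of the connectivity of all
    nodes except the winner (the node with the highest connectivity).

    connectivity: dict node -> k; values may be non-integer (e.g. delta = 0.33).
    """
    if len(connectivity) < 2:
        raise ValueError("need at least one non-winner node")
    winner = max(connectivity, key=connectivity.get)
    others = [k for node, k in connectivity.items() if node != winner]
    return statistics.mean(others), statistics.pstdev(others)


# fully condensed graph: every non-winner keeps only its out-degree delta
print(non_winner_stats({0: 3.0, 1: 0.33, 2: 0.33, 3: 0.33}))  # (0.33, 0.0)
```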
Fig. 6. Average non-winner connectivity in 3-SAT networks. We report the mean connectivity of all the nodes except the winner, as a function of the clauses-to-variables ratio. Each point is an average over 100 3-SAT instances with 30 graphs per instance. We have performed a sixth-order polynomial regression to fit the data. In the inset, we report the standard deviation of the non-winner connectivity for 50-variable instances. Satisfiable instances (with high probability) are translated into condensed graphs, as all the connectivities are equal to δ and the standard deviation is zero. Conversely, unsatisfiable instances (with high probability) are translated into fit-get-rich networks with high standard deviation, thus with emerging hubs. In agreement with the fraction winner, to the left of the condensation area of Figure 4 both the mean and the standard deviation exhibit a higher slope.



7.2 LC-ChainSAT and NLC-ChainSAT Results 

We evaluate each algorithm on a collection of 6885 k-SAT instances obtained from publicly available sources. This benchmark consists of (i) 40 Intel sequential circuits and 95 12s benchmarks used in the 2007 and 2008 hardware model checking competitions [36], and (ii) 2250 random instances for each value of k (k = 3, k = 4, and k = 5), generated uniformly at random using A. van Gelder's generator. We use AIGTOCNF [37] to convert instances from AIG format to CNF. Then, we convert them into 3-CNF instances. We set n ∈ {25, 50, 75, 100, 125} and m such that α = m/n ∈ [α_sat(k) − 4, α_sat(k) + 2], where α_sat(k) has the estimated values α_sat(3) = 4.256, α_sat(4) = 9.931, α_sat(5) = 21.117 (see [38]). For each value of α, we generate 30 different k-SAT instances. We also introduce the following stop criterion [2]: we stop the algorithm either when a solution is found or when 10^ cycles of the main body of the algorithm (i.e., 10^ formula evaluations) have been carried out.

The comparison between ChainSAT and our two modified versions is based on the following definition. Let Z_1 and Z_2 be two algorithms tested on the same set of instances. We say that Z_1 performs better than Z_2 if one of the following conditions is met: (i) Z_1 satisfies more instances than Z_2; (ii) both Z_1 and Z_2 satisfy the same number of instances, but the average number of clauses satisfied by Z_1 is greater than the average number of clauses satisfied by Z_2; (iii) both Z_1 and Z_2 satisfy the same number of instances with the same average number of clauses satisfied, but Z_1 performs fewer flips than Z_2. The parameters of ChainSAT have been chosen to be small enough to work at least up to the "dynamical transition" [25]: we have set p1 = p2 = 0.005 (k = 3), 0.0001 (k = 4), and 0.0002 (k = 5) [26].

k       Solver          Solved   MaxSAT       Flips
k = 3   ChainSAT        2117     38633.57     178793
        LC-ChainSAT     2129     38646.77     179421
        NLC-ChainSAT    2132     38647.47     178543
k = 4   ChainSAT        2089     84043.90     11379888
        LC-ChainSAT     2103     84054.10     11389576
        NLC-ChainSAT    2104     84053.92     11383184
k = 5   ChainSAT        2047     166720.88    16210343
        LC-ChainSAT     2057     166726.64    16206307
        NLC-ChainSAT    2055     166725.50    16254044

Table 2. Summary of SAT solvers performance. Both LC-ChainSAT and NLC-ChainSAT outperform ChainSAT in terms of clauses satisfied by the algorithm. For k = 4, although ChainSAT performs fewer flips than our modified versions, it does not maximize the number of satisfied clauses.
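The comparison rule defined above is a lexicographic order: more instances solved, then more clauses satisfied on average, then fewer flips. A minimal Python sketch with hypothetical names; the aggregated k = 4 rows of Table 2 are used only as an illustration, whereas the paper applies the rule benchmark by benchmark.

```python
from typing import NamedTuple


class SolverResult(NamedTuple):
    solved: int            # instances satisfied
    mean_satisfied: float  # average number of clauses satisfied (MaxSAT column)
    flips: int             # number of flips performed


def performs_better(z1: SolverResult, z2: SolverResult) -> bool:
    """True if Z1 performs better than Z2 according to criteria (i)-(iii)."""
    # lexicographic order: more instances, then more clauses, then fewer flips
    return (z1.solved, z1.mean_satisfied, -z1.flips) > (z2.solved, z2.mean_satisfied, -z2.flips)


chainsat = SolverResult(2089, 84043.90, 11379888)     # k = 4 row of Table 2
lc_chainsat = SolverResult(2103, 84054.10, 11389576)
print(performs_better(lc_chainsat, chainsat))  # True
print(performs_better(chainsat, lc_chainsat))  # False
```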

The analysis of LC-ChainSAT and NLC-ChainSAT shows an improvement in the performance of 3-SAT, 4-SAT, and 5-SAT solvers. In particular, LC-ChainSAT performs better than ChainSAT in 56.8%, 57.3%, and 60% of the benchmarks using 3-SAT, 4-SAT, and 5-SAT instances, respectively. Likewise, NLC-ChainSAT performs better than ChainSAT in 58.3%, 58.7%, and 54.1% of the benchmarks, respectively. A more detailed analysis of the data is shown in Table 2. For each algorithm we report: (i) the number of instances satisfied; (ii) the average number of clauses satisfied over the whole set of instances (see the MaxSAT comparison in Figure 7); (iii) the number of flips obtained by running the algorithm on the whole set of instances (see Figure 8). The MaxSAT problem consists of determining a truth assignment that maximizes the number of clauses satisfied [39]. In order to confirm our results, in Table S1 we compare LC-ChainSAT and ChainSAT on a further 171 instances [38]. We obtain another confirmation of our results if we run the algorithms with the stop criterion set to 10** formula evaluations. In this case, the number of satisfied instances is almost equal to zero for all α values, due to the circumspect descent that characterizes the ChainSAT algorithm. Thus, comparing the percentage of clauses satisfied, both of our modified algorithms are able to satisfy more clauses than ChainSAT, although all algorithms evaluate each instance the same number of times (10** times at most).

Although we are not yet able to establish which of the two versions is better, our results indicate that ordering clauses with the information provided by the S2G algorithm improves the performance of ChainSAT.

8 Discussion 

Our work, using a mapping between the k-SAT problem and the Bose gas, shows numerical evidence of Bose-Einstein condensation in a network model for k-SAT. Analogously to complex networks [37], the graphs of k-SAT instances follow Bose statistics and can undergo Bose-Einstein condensation. The total number of links shared by the most connected node (also called the winner node) varies as a function of the ratio α of clauses to variables. In particular, the fraction winner, plotted as a function of α, distinguishes the Bose-Einstein phase from the fit-get-rich phase. When α < 1.1, the winner node shares all the edges of the graph (winner-takes-all phase). For low α values, the fittest node maintains a finite fraction of links even as the number of variables increases (Bose-Einstein phase); for high α values, the fraction of links connected to the winner decreases with increasing α (fit-get-rich phase). Moreover, the mean and the standard deviation of the non-winner degree distribution increase with increasing α, as growing hubs appear in the graph.

Fig. 7. MaxSAT for k = 3, k = 4, and k = 5. These plots show the percentage of clauses satisfied by LC-ChainSAT and ChainSAT as a function of the number of clauses m and variables n. Remarkably, when solving 3-SAT instances, LC-ChainSAT clearly outperforms ChainSAT.

Fig. 8. Computational effort for k = 5. We plot the number of flips (normalized to 1) performed by the two algorithms. LC-ChainSAT improves on ChainSAT while employing almost the same number of flips, and therefore requires the same computational effort.

It is well known that the phase transition of 3-SAT occurs when the ratio α of clauses to variables belongs to a neighborhood of 4.256. In our work we showed experimentally that the critical temperature for Bose-Einstein condensation in a k-SAT graph also belongs to the same neighborhood. This fact allows us to draw an important conclusion: there is a strict correspondence between the phase transition of k-SAT and the critical temperature for Bose-Einstein condensation. To our knowledge, this is the first time that complex networks and Bose-Einstein condensation have been related to the k-SAT problem without a priori examination of its truth assignments.

We have also presented a hybrid SAT solver that combines the ChainSAT algorithm and the information provided by the S2G algorithm. Our approach is based on the analysis of the energy level related to each clause. We show that by ordering clauses according to their energy we outperform one of the best SAT solvers (ChainSAT, see results [28]) on the majority of the benchmarks. This means that we enhance an algorithm that is able to solve k-SAT problems almost surely in time linear in the number of variables. Hence, our algorithms could also be a good tool from an application point of view, e.g., checking satisfiability of formulas in hardware and software verification.
Supporting Information

A 

Theorem 1. d is a metric.

Proof. Let $X = (X_1, \dots, X_k)$, $Y = (Y_1, \dots, Y_k)$, $Z = (Z_1, \dots, Z_k)$ be three clauses, each made up of k literals. In this proof, we assume that literal permutations are always possible in every clause, so we do not make any assumption on the order of the literals in a clause. This hypothesis follows from the fact that, in k-SAT instances, the OR operator is commutative. Let us verify the three conditions required for a metric.

1. Positive definiteness.
$$d(X,Y) \ge 0 \quad \forall X, Y;$$
$$d(X,Y) = \left|\{\mu \in \{1,\dots,k\} : X_\mu \neq Y_\mu\}\right| = 0 \iff X_\mu = Y_\mu \;\; \forall \mu = 1,\dots,k \iff X = Y.$$

2. Symmetry.
$$d(X,Y) = \left|\{\mu \in \{1,\dots,k\} : X_\mu \neq Y_\mu\}\right| = \left|\{\mu \in \{1,\dots,k\} : Y_\mu \neq X_\mu\}\right| = d(Y,X).$$

3. Triangle inequality.
Let $u(X,Y)$ be the function that gives the number of literals that two clauses have in common, so that $d(X,Y) = k - u(X,Y)$. We need to prove one of the following inequalities (which are equivalent to one another):
$$d(X,Y) \le d(X,Z) + d(Z,Y) \iff k - u(X,Y) \le k - u(X,Z) + k - u(Z,Y) \iff u(X,Y) \ge u(X,Z) + u(Z,Y) - k.$$
Let us define $U_{XY}$ as the set of positions (ranging from 1 to k) in which X and Y coincide. It is clear that $|U_{XY}| = u(X,Y)$ and $U_{XY} \supseteq U_{XZ} \cap U_{ZY}$, due to the transitivity of equality. Hence, using the dimension theorem for vector spaces, it follows that:
$$\dim(U_{XY}) \ge \dim(U_{XZ} \cap U_{ZY}) = \dim(U_{XZ}) + \dim(U_{ZY}) - \dim(U_{XZ} + U_{ZY}).$$
But surely $\dim(U_{XZ} + U_{ZY}) \le k$. So, after replacing the dimensions with the cardinalities of the sets, we finally obtain:
$$u(X,Y) \ge u(X,Z) + u(Z,Y) - k. \qquad \square$$
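The metric properties can also be spot-checked numerically. The Python sketch below assumes, consistently with the proof, that a clause is a set of k distinct literals and that d(X, Y) = k − u(X, Y), with u the number of shared literals; the function names are hypothetical, and the check on random clauses is a sanity test, not a substitute for the proof.

```python
import itertools
import random


def clause_distance(x, y):
    """d(X, Y) = k - u(X, Y), with u the number of shared literals.

    Clauses are sets of k distinct literals (order ignored, since OR commutes).
    """
    if len(x) != len(y):
        raise ValueError("clauses must have the same size k")
    return len(x) - len(set(x) & set(y))


def check_metric(clauses):
    """Assert the three metric axioms on every triple of the given clauses."""
    for x, y, z in itertools.product(clauses, repeat=3):
        assert clause_distance(x, y) >= 0
        assert (clause_distance(x, y) == 0) == (set(x) == set(y))
        assert clause_distance(x, y) == clause_distance(y, x)
        assert clause_distance(x, y) <= clause_distance(x, z) + clause_distance(z, y)


rng = random.Random(0)
k, n = 3, 5
sample = [frozenset(v if rng.random() < 0.5 else -v for v in rng.sample(range(1, n + 1), k))
          for _ in range(25)]
check_metric(sample)
print("all metric axioms hold on", len(sample), "random 3-literal clauses")
```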
B Supporting Algorithms and Tables

Algorithm S1 (Step I) - Selecting_the_First_Clause-Node

1: Λ ← V ← E ← F' ← ∅
2: i ← 1
3: t_1 ← random({1, ..., m})
4: F' ← F' ∪ {C_{t_1}}
5: Λ ← Λ ∪ {L_μ : L_μ ∈ C_{t_1}}
6: V ← V ∪ {v(C_{t_1})}
7: for L_μ ∈ Λ do
8:   ψ(L_μ) ← occurrences of L_μ in F' /* compute local frequency of L_μ */
9: end for
10: f_l(C_{t_1}) ← Σ_{μ=1}^{k} ψ(L_μ), L_μ ∈ C_{t_1} /* compute local fitness of C_{t_1} */
11: f_r(C_{t_1}) ← 1 /* set initial normalized local fitness */
12: ε_{t_1} ← −T · log f_r(C_{t_1}) /* set initial energy level */
13: t ← t_1 /* index of the fittest clause */



Algorithm S2 (Procedure I) - Find_Closest_Clause

t_i ← t' ∈ {1, ..., m} \ {t_1, t_2, ..., t_{i−1}} such that
  d(C_t, C_{t'}) = min{d(C_t, C_{t''}) : t'' ∈ {1, ..., m} \ {t_1, t_2, ..., t_{i−1}}} /* t has been computed in the previous step */
if ∃ two or more clauses with minimum distance then
  t_i ← random(one of the two or more clauses with minimum distance)
end if
F' ← F' ∪ {C_{t_i}}
Λ ← Λ ∪ {L_μ : L_μ ∈ C_{t_i}}
V ← V ∪ {v(C_{t_i})}
Algorithm S3 (Procedure II) - Update_Fitness

1: for L_μ ∈ Λ do
2:   ψ(L_μ) ← occurrences of L_μ in F' /* update local frequency of L_μ */
3: end for
4: for j ← 1 to i do
5:   f_l(C_{t_j}) ← Σ_{μ=1}^{k} ψ(L_μ), L_μ ∈ C_{t_j} /* update local fitness of C_{t_j} */
6: end for
7: t ← t' ∈ {t_1, t_2, ..., t_i} such that f_l(C_{t'}) = max{f_l(C_{t_1}), f_l(C_{t_2}), ..., f_l(C_{t_i})} /* index of the fittest clause */
8: for j ← 1 to i do
9:   f_r(C_{t_j}) ← f_l(C_{t_j}) / f_l(C_t) /* normalize fitness */
10:  ε_{t_j} ← −T · log f_r(C_{t_j}) /* update energy level */
11: end for
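As an illustration of how local frequencies, local fitness, normalized fitness and energy levels relate in Algorithms S1 and S3, here is a compact Python sketch. It assumes clauses are tuples of signed integers, that normalization divides by the fitness of the fittest clause, and that ε = −T log f_r, so that the fittest clause sits at energy 0; the function name and the default T = 1 are hypothetical.

```python
import math
from collections import Counter


def update_fitness(placed, temperature=1.0):
    """Recompute fitness and energy levels for the clauses placed so far (F').

    placed: list of clauses, each a tuple of signed-integer literals.
    Returns (index of the fittest clause, local fitness list, energy list).
    """
    psi = Counter(lit for clause in placed for lit in clause)         # local literal frequencies
    fitness = [sum(psi[lit] for lit in clause) for clause in placed]  # local fitness f_l
    fittest = max(range(len(placed)), key=lambda j: fitness[j])
    # eps = -T log(f_l / f_max) = T log(f_max / f_l); the fittest clause gets 0
    energies = [temperature * math.log(fitness[fittest] / f) for f in fitness]
    return fittest, fitness, energies


clauses = [(1, 2, 3), (1, 2, -4), (5, 6, 7)]
t_idx, f_l, eps = update_fitness(clauses)
print(t_idx, f_l)                  # 0 [5, 5, 3]
print([round(e, 4) for e in eps])  # [0.0, 0.0, 0.5108]
```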

Algorithm S4 (Step II) - Connecting_First_two_Clauses-Nodes

1: i ← i + 1
2: Find_Closest_Clause()
3: Π_{t_1} ← 1 /* probability of linking the node v(C_{t_2}) to the node v(C_{t_1}) */
4: E ← E ∪ {(v(C_{t_1}), v(C_{t_2}))} /* connect v(C_{t_2}) to v(C_{t_1}) */
5: k_{t_1} ← degree(v(C_{t_1})) /* update connectivity of node v(C_{t_1}) */
6: k_{t_2} ← degree(v(C_{t_2})) /* update connectivity of node v(C_{t_2}) */
7: Update_Fitness()
Algorithm S5 ChainSAT

1: S = random assignment of values to the variables
2: chaining = False
3: while S is not a solution do
4:   if chaining = False then
5:     C = a clause not satisfied by S selected uniformly at random (u.a.r.)
6:     V = a variable in C selected u.a.r.
7:   end if
8:   ΔE = change in the number of unsatisfied clauses if V is flipped in S
9:   chaining = False
10:  if ΔE = 0 then
11:    flip V in S
12:  else if ΔE < 0 then
13:    with probability p1 do
14:      flip V in S
15:    end with
16:  else
17:    with probability 1 − p2 do
18:      C = a clause satisfied only by V selected u.a.r.
19:      V' = a variable in C other than V selected u.a.r.
20:      V = V'
21:      chaining = True
22:    end with
23:  end if
24: end while
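To make the control flow of Algorithm S5 concrete, here is a direct Python transcription. It is a sketch for small formulas, not the reference ChainSAT implementation of [26]: it recomputes the unsatisfied clauses at every step instead of maintaining incremental counters, assumes every clause has at least two literals over distinct variables, and uses a hypothetical `max_steps` cap in place of the stop criterion of Section 7.2. The random choices marked as lines 5 and 18 are where LC-ChainSAT and NLC-ChainSAT substitute their ordered selection.

```python
import random


def chainsat(formula, n, p1, p2, max_steps=100_000, seed=None):
    """Transcription of Algorithm S5 for a CNF given as lists of signed ints.

    Returns a satisfying assignment {var: bool}, or None if max_steps is reached.
    """
    rng = random.Random(seed)
    value = {v: rng.random() < 0.5 for v in range(1, n + 1)}  # line 1

    def true_lits(clause):
        return [lit for lit in clause if value[abs(lit)] == (lit > 0)]

    occurs = {v: [] for v in range(1, n + 1)}  # clauses containing each variable
    for clause in formula:
        for lit in clause:
            occurs[abs(lit)].append(clause)

    chaining, var = False, None  # line 2
    for _ in range(max_steps):  # line 3
        unsat = [c for c in formula if not true_lits(c)]
        if not unsat:
            return value
        if not chaining:  # lines 4-7 (random clause choice of line 5)
            var = abs(rng.choice(rng.choice(unsat)))
        # clauses satisfied only by var would break; unsatisfied ones would be repaired
        broken = [c for c in occurs[var] if [abs(l) for l in true_lits(c)] == [var]]
        repaired = [c for c in occurs[var] if not true_lits(c)]
        delta_e = len(broken) - len(repaired)  # line 8
        chaining = False  # line 9
        if delta_e == 0:  # lines 10-11
            value[var] = not value[var]
        elif delta_e < 0:  # lines 12-15
            if rng.random() < p1:
                value[var] = not value[var]
        elif rng.random() < 1 - p2:  # lines 16-22 (random clause choice of line 18)
            clause = rng.choice(broken)
            var = abs(rng.choice([l for l in clause if abs(l) != var]))
            chaining = True
    return value if all(true_lits(c) for c in formula) else None


formula = [[1, 2, 3], [-1, 2, 3], [1, -2, 3], [1, 2, -3]]
print(chainsat(formula, n=3, p1=0.5, p2=0.1, seed=0))
```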
Solver          Solved   MaxSAT    Flips
ChainSAT        163      3213.27   177820
LC-ChainSAT     163      3220.32   179233

Table S1. Further comparison between LC-ChainSAT and ChainSAT on 171 3-SAT instances. Both algorithms solve the same number of instances, but once again LC-ChainSAT satisfies more clauses than ChainSAT.



C S2G Examples 



In this section we present four graphs obtained as a result of the S2G algorithm. The plots in Figure S1 highlight that the most connected nodes have the highest number of particles, and that the winner node is identified with the lowest energy level. These facts help us confirm that, when a Bose-Einstein condensation occurs, there is a clear mapping between the graph derived by the S2G algorithm and the Bose gas at low temperatures.



3-SAT BEC-instance with 10 clauses and 60 variables (Figure S1 (a) and (b))



Co Ci C2 C3 C4 



-F = (^^59 V V55 V V52) A (V413 V V31 V V41) A (Vse V V44 V V18) A (V42 V Vig V ^37) A (V14 V V54 V V22) A 

C5 Cq C7 Cg C(j 



A (V40 V V52 V V27) A (V42 V V55 V V29) A (Vg V V53 V V39) A (V^g V V19 V V27) A (V34 V V25 V Vji) 



3-SAT BEC-instance with 20 clauses and 60 variables (Figure S1 (c) and (d))



Co Ci C2 C3 C4 



F = (V50 V V55 V V52) A (V46 V V31 V V41) A (V5e V V44 V Vis) A (V42 V Via v V27) a (V14 v V54 V V22) a 

C5 Cq Cj Cg Cg 



, (V40 V V52 V V27) A (V42 V 1/55 V V29) A (Vg V V53 V Vgg) A (Vjg V V^g V V27) A (V34 V V25 V V]^ ^ ) A 



- (V30 V V46 V Vgo) A (V45 V V23 V Vis) A (Via V V44 V Vg) A (V5 V V5S V V4) A (V4g V V44 V V40) / 
Ci5 C16 Ci7 Cis Cig 



A (Vg VVg V V32) A (V2S V V50 V V35) A (Vgo V V13 V V54) A (V53 V V5 VVj ) A (Vjo V V24 V V55) 



3-SAT BEC-instance with 30 clauses and 60 variables (Figure S1 (e) and (f))



Co Ci C2 C3 C4 



F = (V59 V V55 V V52) A (V46 V V31 VV41) A (V56 V V44 V Vig) A (V42 V Vio V V27) A (V14 V V54 V V22 ) A 
C5 Cq C7 Cg Cg 



' (V40 V V52 V V27) A (V42 V V55 V V29) A (Vg V V53 V V39) A (V48 V Vig V V27) A (V34 V V25 V V^i) , 
Cio Cii C12 C13 Ci4 



A (V30 V V46 V Vgo) A (V45 V V23 V Vis) A (Vio V V44 V Vg) A (V5 V V58 V V4 ) A (V48 V V44 V V40 ) A 
C15 C16 Ci7 C18 C19 



A (Vg VVg VV32) A (V2S V V50 V V35) A (Vgo V V13 V V54) A (V53 VV5 VVj ) A (Vjo V V24 V V55) , 
C20 C21 C22 C23 C24 



, (V35 V V2S V V42) A (Vig V Vgo V V35) A (V3 V V45 V V50 ) A (V59 V V45 V V55) A (Vjo V V50 V V54) A 
C25 *^26 ^27 '^28 ^^29 



, (V38 V Vi4 V V43) A (V5 V V3S V V14) A (Vi7 V V33 V V44) A (V57 V Vag V V21) A (V21 V V36 V V50) 
Fig. S1. 3-SAT S2G graph and Bose gas. We show the graphs derived from a sample instance with 10 (a), 20 (c), and 30 (e) clauses, and the energy levels (b), (d), (f) identified by the S2G algorithm. In these figures, when two lines are close to each other, they are two distinct degenerate states of the same energy level. For instance, in (b) the clauses Cg, C3, C\ and C5 represent four degenerate states of the same energy level. In (d) and (f), degenerate states without particles are omitted. In (e), isolated vertices are omitted. The weight on each edge represents the probability that the corresponding link is established. The node with the lowest energy gets the highest number of particles (BEC phase).
3-SAT FGR-instance with 20 clauses and 20 variables (Figure S2):

Co Ci C2 Ca 



Ci 



-F = (^5 V V3 V V17) A (V3 V V20 V V5) A (Ve V Vi3 V Vji) A (Vig V Vji V Vg) A (Vjy V Vjg V V^) A 
C5 Cg C7 Cg Cg 



A (V4 V Vi4 V Vis) '^ (^10 V V2 V V5) A (Vg V V^ V Vg ) A (Vji V Vi V Vg) A ( Vg V Vj 5 V V13) A 
Cin Ci 1 C12 Ci3 Ci4 



> (Vg V Vis V V17) A (Vg V Vi4 V V20) A (Vg V Vjg V Vg ) A (Vjo V V5 V V20 ) A (V13 V Vg V Vg ) / 
C15 C16 017 ClS Cig 



A (V5 V V4 V Ve) A (Vig V V3 V Vio) A (V14 V Vg V Vjs) A (V12 V V5 V V4 ) A ( V4 V Vj 5 V V2 ) 






Fig. S2. 3-SAT S2G graph and fit-get-rich phase. In (a) we show the graph derived from a sample instance with 20 clauses, and in (b) the energy levels identified by the S2G algorithm. When two lines are close to each other, they are two distinct degenerate states of the same energy level. For instance, the clauses C13 and C15 represent two degenerate states of the same energy level. Degenerate states without particles are omitted. The weight on each edge represents the probability that the corresponding link is established. In this example a fit-get-rich (FGR) phase occurs, consisting of three hubs Cg, C13 and Cig.



D S2G-PA Examples 
3-SAT BEC-instance with 10 clauses and 30 variables (Figure S3 (a) and (b))



$$F = C_0 \wedge C_1 \wedge \cdots \wedge C_{9},$$
where each clause $C_i$ is the disjunction of three literals over the variables $V_1, \dots, V_{30}$.



3-SAT BEC instance with 20 clauses and 30 variables (Figure S3 (c) and (d)):



$$F = C_0 \wedge C_1 \wedge \cdots \wedge C_{19},$$
where each clause $C_i$ is the disjunction of three literals over the variables $V_1, \dots, V_{30}$; clauses $C_0, \dots, C_9$ are those of the 10-clause instance above.



3-SAT BEC instance with 30 clauses and 30 variables (Figure S3 (e) and (f)):



$$F = C_0 \wedge C_1 \wedge \cdots \wedge C_{29},$$
where each clause $C_i$ is the disjunction of three literals over the variables $V_1, \dots, V_{30}$; clauses $C_0, \dots, C_{19}$ are those of the 20-clause instance above.
Fig. S3. 3-SAT S2G-PA graph and Bose gas. We show the graphs derived from sample instances with 10 (a), 20 (c) and 30 (e) clauses, and the energy levels (b), (d), (f) identified by the S2G-PA algorithm. Thanks to the fitness-based preferential attachment, there are no isolated vertices, so every degenerate state is populated by at least one particle. These examples show an almost complete BEC.
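The absence of isolated vertices follows from the attachment step: every node that joins the network brings at least one link, and in the Bianconi-Barabási mapping a node's links correspond to particles on its energy level, so its degree plays the role of that level's occupation. The growth sketch below assumes the standard Bianconi-Barabási rule with hypothetical energies, β, link count `m` and a two-node seed graph; it is not the S2G-PA algorithm itself.

```python
import math
import random


def grow_fitness_network(energies, beta, m=1, seed=0):
    """Grow a network with fitness-based preferential attachment (Bianconi-Barabasi).

    Node i has fitness exp(-beta * energies[i]); nodes join in list order and
    each newcomer links to min(m, #existing) distinct earlier nodes, chosen with
    probability proportional to fitness * degree. Nodes 0 and 1 start linked.
    """
    rng = random.Random(seed)
    degrees = [1, 1]
    edges = [(0, 1)]
    for new in range(2, len(energies)):
        existing = range(new)
        weights = [math.exp(-beta * energies[i]) * degrees[i] for i in existing]
        targets = set()
        while len(targets) < min(m, new):
            targets.add(rng.choices(existing, weights=weights)[0])
        degrees.append(0)
        for t in targets:
            edges.append((new, t))
            degrees[t] += 1
            degrees[new] += 1
    return degrees, edges


# Hypothetical energies drawn uniformly from [0, 1).
energy_rng = random.Random(1)
energies = [energy_rng.random() for _ in range(50)]
degrees, edges = grow_fitness_network(energies, beta=5.0, m=2)

# Every node has degree >= 1, i.e. no energy level is left empty.
print(min(degrees), max(degrees))
```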



E Standard Deviation of Non-Winner Degree Distribution
Fig. S4. Standard deviation of the non-winner degree distribution. Each point is an average over 100 3-SAT instances, with 30 graphs per instance. The distribution considered is the degree distribution with the winner node excluded. Both the mean and the standard deviation increase as α increases, numerically showing the emergence of new hubs.
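The per-graph quantity behind Fig. S4 can be computed directly from a degree sequence. This sketch assumes the population standard deviation and removes a single winner, breaking ties by the lowest index; the caption does not state either choice, and the function name is hypothetical. Averaging over 30 graphs per instance and 100 instances, as in the figure, would wrap this in two loops.

```python
import statistics


def non_winner_stats(degrees):
    """Mean and population standard deviation of the degrees, winner excluded.

    The winner is the highest-degree node; ties go to the lowest index.
    """
    winner = max(range(len(degrees)), key=degrees.__getitem__)
    rest = [k for i, k in enumerate(degrees) if i != winner]
    return statistics.mean(rest), statistics.pstdev(rest)


mean, std = non_winner_stats([10, 2, 3, 4])
print(mean, round(std, 4))
```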
F Typical S2G-PA Networks





[Figure S5 panels: α = 1, α = 3, α = 4.2, α = 5, α = 6]




Fig. S5. Typical S2G-PA output networks with 10 variables. The probability of undergoing a full BEC increases as α decreases. When α = 1, the network obtained by the S2G-PA algorithm features a BEC regardless of the SAT instance given as input. When α ~ 10, the typical network shows a partial condensation on the winner node, with other growing hubs connected to it.
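The α sweep in Fig. S5 presupposes random 3-SAT instances at a prescribed clause-to-variable ratio. The generator below assumes the standard uniform random k-SAT model (k distinct variables per clause, each negated with probability 1/2), which the captions do not spell out; the function name and the DIMACS-style signed-integer literals are illustrative choices.

```python
import random


def random_ksat(n_vars, alpha, k=3, seed=None):
    """Uniform random k-SAT instance with m = round(alpha * n_vars) clauses.

    Each clause uses k distinct variables drawn uniformly from 1..n_vars, and
    each variable is negated with probability 1/2. Literals are DIMACS-style
    signed integers (-3 means the negation of variable 3).
    """
    rng = random.Random(seed)
    m = round(alpha * n_vars)
    clauses = []
    for _ in range(m):
        variables = rng.sample(range(1, n_vars + 1), k)
        clauses.append(tuple(v if rng.random() < 0.5 else -v for v in variables))
    return clauses


# One instance per ratio shown in Fig. S5, with 10 variables as in the caption.
instances = {alpha: random_ksat(10, alpha, seed=0) for alpha in (1, 3, 4.2, 5, 6)}
print({alpha: len(f) for alpha, f in instances.items()})
```

Each instance would then be passed to the graph-construction step; with n = 10 the ratios above give 10, 30, 42, 50 and 60 clauses.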



